Subgraph isomorphism, or subgraph matching, is generally considered an NP-complete problem, made even harder in practical applications where edge weights take real values and are subject to measurement noise and possible anomalies. To the best of our knowledge, almost all subgraph matching methods rely on node labels to perform node-to-node matching. In the absence of such labels (as in applications such as image matching and map matching), these subgraph matching methods do not work. We propose a method to identify the node correspondence between a subgraph and a full graph in the inexact case without node labels, in two steps: (a) extract the minimal unique topology-preserving subset from the subgraph and find its feasible matches in the full graph, and (b) apply a consensus-based algorithm to expand the matched node set by pairing unique paths based on boundary commutativity. Beyond existing subgraph matching approaches, the proposed method is shown to have realistically sublinear computational efficiency, robustness to random measurement noise, and good statistical properties. Our method is also readily applicable to the exact matching case without loss of generality. To demonstrate its effectiveness, a simulation on Erdos-Renyi random graphs and a case study on an image-based affine covariant feature dataset are carried out.
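As a concrete anchor for the label-free setting, the sketch below shows the exact-matching special case that the abstract notes the method also covers. It is only an illustration using off-the-shelf VF2 subgraph isomorphism in networkx, not the proposed two-step inexact algorithm; the graph sizes and the q-prefixed relabeling are arbitrary choices.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Label-free exact matching baseline: recover node correspondences from topology alone
# (plain VF2 subgraph isomorphism; NOT the paper's two-step consensus-based method).
full = nx.erdos_renyi_graph(30, 0.3, seed=1)                # the full graph
sub = full.subgraph([0, 1, 2, 3, 4, 5]).copy()              # a query subgraph
sub = nx.relabel_nodes(sub, {n: f"q{n}" for n in sub})      # hide the original node identities

matcher = isomorphism.GraphMatcher(full, sub)               # no node_match: structure only
mapping = next(matcher.subgraph_isomorphisms_iter(), None)  # full-graph node -> query node
print(mapping)                                              # None if no structural match exists
```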
In this paper, we consider a problem that has attracted increasing attention in recent years, driven by the expense of expert pixel-level annotation and the availability of large numbers of unannotated normal and abnormal image scans. We introduce a segmentation network that exploits adversarial learning to partition an image into two cuts, one of which falls into a reference distribution provided by the user. This Adversarial-based Selective Cutting network (ASC-Net) bridges the two domains of cluster-based deep segmentation and adversarial-based anomaly/novelty detection algorithms. Our ASC-Net learns to segment anomalies in medical scans from normal and abnormal medical scans without any mask supervision. We evaluate this unsupervised anomaly segmentation model on three public datasets, i.e., BraTS 2019 for brain tumor segmentation, a liver lesion segmentation dataset, and MS-SEG 2015 for brain lesion segmentation, as well as a private brain tumor segmentation dataset. Compared with existing methods, our model demonstrates substantial performance gains on unsupervised anomaly segmentation tasks. Although there is still room to further improve performance relative to supervised learning algorithms, the promising experimental results and interesting observations shed light on building unsupervised learning algorithms for medical anomaly identification using user-defined knowledge.
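The adversarial selective-cutting idea can be pictured with a small two-player setup. The sketch below is only one interpretation of the abstract, with hypothetical layer sizes and losses rather than the authors' ASC-Net architecture: a segmenter splits each scan into two cuts, a discriminator pushes the first cut toward the user-provided reference distribution, and the residual cut is left to collect candidate anomalies.

```python
import torch
import torch.nn as nn

# Two-player sketch of adversarial selective cutting (hypothetical layer sizes and losses,
# not the authors' ASC-Net): a segmenter produces two cuts, and a discriminator pulls the
# first cut toward the user-provided reference distribution.
class Segmenter(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 1),                       # two-channel logits: cut-1 vs cut-2
        )

    def forward(self, x):
        return torch.softmax(self.net(x), dim=1)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
        )

    def forward(self, x):
        return self.net(x)

segmenter, disc = Segmenter(), Discriminator()
bce = nn.BCEWithLogitsLoss()
scan = torch.rand(4, 1, 64, 64)       # unlabeled scans
reference = torch.rand(4, 1, 64, 64)  # user-provided reference ("normal") images

cut1 = segmenter(scan)[:, :1] * scan  # region that should follow the reference distribution
# Discriminator step: reference images are real, the first cut is fake.
d_loss = bce(disc(reference), torch.ones(4, 1)) + bce(disc(cut1.detach()), torch.zeros(4, 1))
# Segmenter step: make cut-1 indistinguishable from the reference; the residual cut-2
# region is left to collect whatever does not fit, i.e. candidate anomalies.
g_loss = bce(disc(cut1), torch.ones(4, 1))
```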
The training and test data of deep-neural-network-based classifiers are usually assumed to be sampled from the same distribution. When part of the test samples are drawn from a distribution far away from that of the training samples (a.k.a. out-of-distribution (OOD) samples), the trained neural network tends to make high-confidence predictions for these OOD samples. Detecting OOD samples is critical when training neural networks for image classification, object detection, and so on. It can enhance the classifier's robustness to irrelevant inputs and improve system resilience and security under different forms of attack. Detecting OOD samples poses three main challenges: (i) the proposed OOD detection method should be compatible with various architectures of classifiers (e.g., DenseNet, ResNet) without significantly increasing model complexity or demands on computational resources; (ii) OOD samples may come from multiple distributions, whose class labels are commonly unavailable; (iii) a score function needs to be defined to effectively separate OOD samples from in-distribution (IND) samples. To overcome these challenges, we propose a Wasserstein-based out-of-distribution detection (WOOD) method. The basic idea is to define a Wasserstein-distance-based score that evaluates the dissimilarity between a test sample and the distribution of IND samples. An optimization problem is then formulated and solved based on the proposed score function. The statistical learning properties of the proposed method are investigated to guarantee that the loss value achieved by the empirical optimizer approximates the global optimum. Comparison study results demonstrate that the proposed WOOD consistently outperforms other existing OOD detection methods.
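A toy version of the Wasserstein-based scoring idea is sketched below. It is a simplified stand-in rather than the optimization problem solved by WOOD, and the beta-distributed confidence scores are purely illustrative: a batch of test scores is compared against the empirical in-distribution (IND) score distribution, and a larger distance suggests OOD inputs.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Simplified Wasserstein-based OOD score (illustrative only; WOOD formulates and solves
# an optimization problem on top of such a score). The beta-distributed confidence
# values below are assumptions, not real classifier outputs.
def wood_like_score(test_scores, ind_scores):
    """Larger distance to the in-distribution (IND) score distribution -> more likely OOD."""
    return wasserstein_distance(test_scores, ind_scores)

rng = np.random.default_rng(0)
ind_confidences = rng.beta(8, 2, size=1000)    # assumed IND max-softmax confidences
ind_batch = rng.beta(8, 2, size=50)            # a held-out IND test batch
ood_batch = rng.beta(2, 2, size=50)            # a batch drawn from a different distribution

print(wood_like_score(ind_batch, ind_confidences))   # small distance
print(wood_like_score(ood_batch, ind_confidences))   # noticeably larger distance
```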
Remembering and forgetting mechanisms are two sides of the same coin in the human learning-memory system. Inspired by the memory mechanisms of the human brain, modern machine learning systems have been striving to endow machines with lifelong learning capability through better remembering, while treating forgetting as the enemy to overcome. Nevertheless, this idea may capture only half of the picture. More recently, a growing number of researchers have argued that the brain is born to forget, i.e., forgetting is a natural and active process that yields abstract, rich, and flexible representations. This paper presents a learning model with an active forgetting mechanism for artificial neural networks. The active forgetting mechanism (AFM) is introduced into the network via a "plug-and-play" forgetting layer (P&PF), which consists of inhibitory neurons with an Internal Regulation Strategy (IRS) that adjusts their own extinction rate through lateral inhibition, and an External Regulation Strategy (ERS) that regulates the extinction rate of excitatory neurons through inhibition. Experimental studies show that the P&PF offers surprising benefits: adaptive structure, strong generalization, long-term learning and memory, and robustness to data and parameter perturbations. This work sheds light on the importance of forgetting in the learning process and provides new perspectives for understanding the underlying mechanisms of neural networks.
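One way to picture a "plug-and-play" forgetting layer is as a learnable inhibitory gate inserted between ordinary layers. The sketch below is a hypothetical formulation, not the paper's exact P&PF design with IRS and ERS: per-feature inhibition logits suppress excitatory activations, so features can be gradually extinguished during training.

```python
import torch
import torch.nn as nn

# Hypothetical "plug-and-play" forgetting gate (not the exact P&PF layer with IRS/ERS):
# learnable inhibition logits decide how strongly each excitatory feature is suppressed,
# so rarely useful features can be gradually extinguished during training.
class ForgettingGate(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.inhibition = nn.Parameter(torch.zeros(dim))    # per-feature extinction logits

    def forward(self, x):
        keep = torch.sigmoid(-self.inhibition)              # 1 = fully kept, 0 = forgotten
        return x * keep

net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), ForgettingGate(64), nn.Linear(64, 10))
logits = net(torch.randn(8, 32))
print(logits.shape)                                         # torch.Size([8, 10])
```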
Representing and synthesizing novel views in real-world dynamic scenes from casual monocular videos is a long-standing problem. Existing solutions typically approach dynamic scenes by applying geometry techniques or utilizing temporal information between several adjacent frames without considering the underlying background distribution in the entire scene or the transmittance over the ray dimension, limiting their performance on static and occluded areas. Our approach, $\textbf{D}$istribution-$\textbf{D}$riven neural radiance fields, termed $\text{D}^4$NeRF, offers high-quality view synthesis and a 3D solution to $\textbf{D}$etach the background from the entire $\textbf{D}$ynamic scene. Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects, respectively. Each ray sample is given an additional occlusion weight to indicate the transmittance lying in the static and dynamic components. We evaluate $\text{D}^4$NeRF on public dynamic scenes and our urban driving scenes acquired from an autonomous-driving dataset. Extensive experiments demonstrate that our approach outperforms previous methods in rendering texture details and motion areas while also producing a clean static background. Our code will be released at https://github.com/Luciferbobo/D4NeRF.
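The occlusion-weight idea can be illustrated with standard volume rendering over one ray. The blending scheme below is an assumption about how the static and dynamic branches could be composited, not necessarily the exact $\text{D}^4$NeRF formulation; all densities, colors, and weights are random placeholders.

```python
import torch

# Assumed compositing of a static-background branch and a dynamic branch along one ray
# using a per-sample occlusion weight (an illustration; the exact D^4NeRF blending may
# differ). All densities, colors, and weights are random placeholders.
n_samples = 64
sigma_static = torch.rand(n_samples)          # densities from the static background field
sigma_dynamic = torch.rand(n_samples)         # densities from the 6D-input dynamic field
rgb_static = torch.rand(n_samples, 3)
rgb_dynamic = torch.rand(n_samples, 3)
occ = torch.rand(n_samples, 1)                # occlusion weight: 1 -> dynamic, 0 -> static
delta = 1.0 / n_samples                       # sample spacing along the ray

sigma = (1 - occ.squeeze(-1)) * sigma_static + occ.squeeze(-1) * sigma_dynamic
rgb = (1 - occ) * rgb_static + occ * rgb_dynamic

alpha = 1 - torch.exp(-sigma * delta)                                      # per-sample opacity
trans = torch.cumprod(torch.cat([torch.ones(1), 1 - alpha[:-1]]), dim=0)   # transmittance
weights = trans * alpha
pixel = (weights.unsqueeze(-1) * rgb).sum(dim=0)                           # rendered RGB
print(pixel)
```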
The architecture of transformers, which has recently witnessed booming applications in vision tasks, has pivoted away from the widespread convolutional paradigm. Relying on the tokenization process that splits inputs into multiple tokens, transformers are capable of extracting their pairwise relationships using self-attention. Although the tokenizer is a foundational building block of transformers, what makes for a good tokenizer has not been well understood in computer vision. In this work, we investigate this uncharted problem from an information trade-off perspective. In addition to unifying and understanding existing structural modifications, our derivation leads to better design strategies for vision tokenizers. The proposed Modulation across Tokens (MoTo) incorporates inter-token modeling capability through normalization. Furthermore, a regularization objective TokenProp is embraced in the standard training regime. Through extensive experiments on various transformer architectures, we observe both improved performance and intriguing properties of these two plug-and-play designs with negligible computational overhead. These observations further indicate the importance of the commonly-omitted tokenizer designs in vision transformers.
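One reading of "modulation across tokens through normalization" is a normalization layer whose statistics are shared over the token axis, so every token is modulated by the others. The module below is that interpretation only, with hypothetical shapes, and should not be taken as the official MoTo design.

```python
import torch
import torch.nn as nn

# One interpretation of inter-token modulation through normalization (hypothetical, not
# the official MoTo module): statistics are computed over the token axis, so each token
# is rescaled using information pooled from all other tokens in the sequence.
class CrossTokenNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
        self.eps = eps

    def forward(self, x):                                   # x: (batch, tokens, dim)
        mean = x.mean(dim=1, keepdim=True)                  # shared across tokens
        var = x.var(dim=1, keepdim=True, unbiased=False)
        return (x - mean) / torch.sqrt(var + self.eps) * self.gamma + self.beta

tokens = torch.randn(2, 196, 384)                           # e.g. 14x14 patch tokens, width 384
print(CrossTokenNorm(384)(tokens).shape)                    # torch.Size([2, 196, 384])
```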
In recent years, vision-centric perception has flourished in various autonomous driving tasks, including 3D detection, semantic map construction, motion forecasting, and depth estimation. Nevertheless, the latency of vision-centric approaches is too high for practical deployment (e.g., most camera-based 3D detectors have a runtime greater than 300ms). To bridge the gap between ideal research and real-world applications, it is necessary to quantify the trade-off between performance and efficiency. Traditionally, autonomous-driving perception benchmarks perform offline evaluation, neglecting the inference time delay. To mitigate this problem, we propose the Autonomous-driving StreAming Perception (ASAP) benchmark, which is the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving. On the basis of the 2Hz annotated nuScenes dataset, we first propose an annotation-extending pipeline to generate high-frame-rate labels for the 12Hz raw images. With practical deployment in mind, the Streaming Perception Under constRained-computation (SPUR) evaluation protocol is further constructed, where the 12Hz inputs are utilized for streaming evaluation under the constraints of different computational resources. In the ASAP benchmark, comprehensive experiment results reveal that the model ranking changes under different constraints, suggesting that model latency and computation budget should be considered as design choices when optimizing for practical deployment. To facilitate further research, we establish baselines for camera-based streaming 3D detection, which consistently enhance the streaming performance across various hardware. ASAP project page: https://github.com/JeffWang987/ASAP.
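The essence of streaming evaluation is that each query timestamp is scored against the most recent prediction that has already finished computing, so slow models are penalized by stale or missing outputs. The helper below sketches that matching rule under assumed timestamps and latencies; it is not the full SPUR protocol.

```python
from bisect import bisect_right

# Streaming matching rule (simplified; not the full SPUR protocol): each query timestamp
# is paired with the latest prediction whose computation has already finished.
def match_streaming_predictions(query_times, pred_finish_times):
    matches = []
    for t in query_times:
        i = bisect_right(pred_finish_times, t) - 1   # last prediction finished by time t
        matches.append(i if i >= 0 else None)        # None: no prediction available yet
    return matches

queries = [round(k / 12.0, 3) for k in range(8)]     # 12 Hz query timestamps (seconds)
finished = [0.35, 0.70]                              # e.g. a detector with ~350 ms latency
print(match_streaming_predictions(queries, finished))
# [None, None, None, None, None, 0, 0, 0]: early queries have no usable prediction yet
```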
Lobster eye telescopes are ideal monitors for detecting X-ray transients, because they can observe celestial objects over a wide field of view in the X-ray band. However, images obtained by lobster eye telescopes are modified by their unique point spread functions, making it hard to design a high-efficiency target detection algorithm. In this paper, we integrate several machine learning algorithms to build a target detection framework for data obtained by lobster eye telescopes. Our framework first generates two 2D images with different pixel scales according to the positions of photons on the detector. An algorithm based on morphological operations and two neural networks is then used to detect candidate celestial objects with different fluxes from these 2D images. Finally, a random forest algorithm picks out the final detection results from the candidates obtained in the previous steps. Tested with simulated data of the Wide-field X-ray Telescope onboard the Einstein Probe, our detection framework achieves over 94% purity and over 90% completeness for targets with flux greater than 3 mCrab (9.6 × 10^-11 erg cm^-2 s^-1), and more than 94% purity with moderate completeness for targets with lower flux, at acceptable time cost. The framework proposed in this paper can serve as a reference for data processing methods developed for other lobster eye X-ray telescopes.
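The first stage of the framework, binning detector photons into two images with different pixel scales, can be sketched as follows. The detector size, pixel scales, and uniform photon positions are assumptions for illustration, not parameters of the Wide-field X-ray Telescope.

```python
import numpy as np

# First stage of the described pipeline (assumed binning scheme and detector geometry,
# not the actual Wide-field X-ray Telescope parameters): photon positions are histogrammed
# into two 2D images with different pixel scales for the later detection stages.
def photons_to_images(x, y, detector_size=60.0, pixel_scales=(1.0, 0.25)):
    """Return one 2D counts image per pixel scale (pixel size in detector units)."""
    images = []
    for pix in pixel_scales:
        nbins = int(detector_size / pix)
        img, _, _ = np.histogram2d(x, y, bins=nbins, range=[[0, detector_size]] * 2)
        images.append(img)
    return images

rng = np.random.default_rng(42)
x = rng.uniform(0, 60, 5000)                   # simulated photon positions on the detector
y = rng.uniform(0, 60, 5000)
coarse, fine = photons_to_images(x, y)
print(coarse.shape, fine.shape)                # (60, 60) (240, 240)
```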
Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts the segmentation performance, especially for rare object categories. Although diverse, high-quality object instances used in Copy-Paste result in more performance gain, previous works utilize object instances either from human-annotated instance segmentation datasets or rendered from 3D object models, and both approaches are too expensive to scale up to obtain good diversity. In this paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion). We demonstrate for the first time that using a text2image model to generate images or a zero-shot recognition model to filter noisily crawled images for different object categories is a feasible way to make Copy-Paste truly scalable. To make such success happen, we design a data acquisition and processing framework, dubbed "X-Paste", upon which a systematic study is conducted. On the LVIS dataset, X-Paste provides impressive improvements over the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it achieves gains of +2.6 box AP and +2.1 mask AP on all classes, and even more significant gains of +6.8 box AP and +6.5 mask AP on long-tail classes.
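The core Copy-Paste operation itself is simple, independent of how X-Paste sources and filters its object instances. A minimal sketch under assumed array shapes: an object crop and its binary mask are pasted onto a random location of a background image, and the pasted mask becomes a new instance annotation.

```python
import numpy as np

# Core Copy-Paste operation (independent of how X-Paste sources its instances; shapes and
# the random-placement policy are assumptions): paste an object crop and its mask onto a
# background image, and keep the pasted mask as a new instance annotation.
def copy_paste(background, instance, mask, rng):
    """Paste `instance` (h, w, 3) where `mask` (h, w) is True onto a random location."""
    out = background.copy()
    h, w = mask.shape
    H, W = background.shape[:2]
    top = rng.integers(0, H - h + 1)
    left = rng.integers(0, W - w + 1)
    region = out[top:top + h, left:left + w]
    region[mask] = instance[mask]                       # overwrite only the object pixels
    full_mask = np.zeros((H, W), dtype=bool)
    full_mask[top:top + h, left:left + w] = mask
    return out, full_mask                               # new training image and its mask

rng = np.random.default_rng(0)
bg = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)
obj = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)
obj_mask = np.zeros((64, 64), dtype=bool)
obj_mask[16:48, 16:48] = True                           # a square "object" for illustration
image, new_mask = copy_paste(bg, obj, obj_mask, rng)
```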
Generative adversarial networks (GANs) have achieved great success in image inpainting yet still have difficulty tackling large missing regions. In contrast, iterative algorithms, such as autoregressive and denoising diffusion models, have to be deployed with massive computing resources to achieve decent results. To overcome their respective limitations, we present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image, greatly improving inference efficiency. In addition, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion. On multiple benchmarks, we achieve new state-of-the-art performance. Code is released at https://github.com/fenglinglwb/SDM.
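The "gradually deliver informative pixels" idea can be caricatured with a toy loop: starting from the known region, each iteration trusts a new band of predicted pixels and expands the known mask. The mean-fill predictor, band width, and step count below are stand-ins for illustration, not the actual SDM model.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def predict_full_image(image, known):
    # Stand-in predictor (assumption): fill unknown pixels with the mean of the known ones.
    fill = image[known].mean() if known.any() else 0.0
    out = image.copy()
    out[~known] = fill
    return out

def spatial_diffusion_inpaint(image, known_mask, steps=4, band_width=8):
    """Toy spatial-diffusion loop: propagate pixels outward from the known region."""
    image, known = image.copy(), known_mask.copy()
    for _ in range(steps):
        pred = predict_full_image(image, known)
        band = binary_dilation(known, iterations=band_width) & ~known  # newly delivered pixels
        image[band] = pred[band]
        known |= band
    image[~known] = predict_full_image(image, known)[~known]           # finish any remainder
    return image

img = np.random.rand(64, 64)
mask = np.zeros((64, 64), dtype=bool)
mask[:, :20] = True                              # left strip of the image is known
result = spatial_diffusion_inpaint(img, mask)
print(result.shape)                              # (64, 64), hole filled in a few steps
```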